{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "GoogLeNet_implementation.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyPGQOcEBIPMnjcNfxjYUx1w",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Machine-Learning-Tokyo/CNN-Architectures/blob/master/Implementations/GoogLeNet/GoogLeNet_implementation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gzUiWE5Up73Y",
        "colab_type": "text"
      },
      "source": [
        "## Implementation of GoogLeNet\n",
        "\n",
        "<br><span style=\"font-size:14pt;\">We will use the tensorflow.keras Functional API to build GoogLeNet</span>\n",
        "(https://arxiv.org/pdf/1409.4842.pdf)\n",
        "\n",
        "---\n",
        "\n",
        "In the paper we can read:\n",
        "\n",
        ">**[i]** “All the convolutions, including those inside the Inception modules, use rectified linear activation”\n",
        "\n",
        "<br>\n",
        "\n",
        "We will also use the following Table **[ii]**:\n",
        "\n",
        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/GoogleNet/GoogleNet.png width=\"700\">\n",
        "\n",
        "<br>\n",
        "\n",
        "as well the following Diagram **[iii]** of the Inception block:\n",
        "\n",
        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/GoogleNet/Inception_block.png width=\"300\">\n",
        "\n",
        "---\n",
        "\n",
        "## Network architecture\n",
        "\n",
        "In GoogleNet starts with two Conv-MaxPool blocks and then continues with a series of **Inception** blocks separated by *Max Pool* layers before the fineal *Fully Connected* layer.\n",
        "\n",
        "### Inception block\n",
        "\n",
        "The Inception block is depicted at **[iii]**.\n",
        "\n",
        "It takes as input a tensor and passes it through **4 different streams**:\n",
        "1. a 1x1 Conv layer\n",
        "2. a 1x1 Conv layer followed by a 3x3 Conv layer\n",
        "3. a 1x1 Conv layer followed by a 5x5 Conv layer\n",
        "4. a 3x3 Max Pool layer followed by a 1x1 Conv layer\n",
        "\n",
        "The output tensors of all four final Conv layers are **concatenated** to one tensor.\n",
        "\n",
        "---\n",
        "\n",
        "## Workflow\n",
        "We will:\n",
        "1. import the neccesary layers\n",
        "2. write a helper function for the inception_block()\n",
        "3. test inception_block()\n",
        "4. write the code for the stem of the model\n",
        "5. write the code for the main part (Inception blocks) of the model\n",
        "\n",
        "---\n",
        "\n",
        "### 1. Imports"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "R_GItf3pdcuy",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, \\\n",
        "     Concatenate, AvgPool2D, Dropout, Flatten, Dense"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e-YeW15ZqJwa",
        "colab_type": "text"
      },
      "source": [
        "### 2. inception_block()\n",
        "Next, we will build the *Inception block* as a function that will:\n",
        "- take as inputs:\n",
        "  - a tensor (**`x`**)\n",
        "  - a list with the number of filters for each one of the 6 Convolutional layers of an Inception block (**`filters`**)\n",
        "- run:\n",
        "    - apply the structure of *Inception block* as described above\n",
        "- return the tensor\n",
        "\n",
        "and will return the concatenaded tensor **`output`**."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Pxeoe98nqXsW",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def inception_block(x, filters):\n",
        "    t1 = Conv2D(filters=filters[0], kernel_size=1, activation='relu')(x)\n",
        "\n",
        "    t2 = Conv2D(filters=filters[1], kernel_size=1, activation='relu')(x)\n",
        "    t2 = Conv2D(filters=filters[2], kernel_size=3, padding='same', activation='relu')(t2)\n",
        "\n",
        "    t3 = Conv2D(filters=filters[3], kernel_size=1, activation='relu')(x)\n",
        "    t3 = Conv2D(filters=filters[4], kernel_size=5, padding='same', activation='relu')(t3)\n",
        "\n",
        "    t4 = MaxPool2D(pool_size=3, strides=1, padding='same')(x)\n",
        "    t4 = Conv2D(filters=filters[5], kernel_size=1, activation='relu')(t4)\n",
        "\n",
        "    output = Concatenate()([t1, t2, t3, t4])\n",
        "    return output"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bulAKcRQqtka",
        "colab_type": "text"
      },
      "source": [
        "### 3. Test of Inception block\n",
        "Notice how we use the input tensor **`x`** as input to the 1st layer of each stream, but we use the output tensor of the 1st layer of a stream as input to the 2nd layer of the stream.\n",
        "e.g.:\n",
        "\n",
        "**Code:**\n",
        "```python\n",
        "t2 = Conv2D(filters=filters[1], kernel_size=1, activation='relu')(x)\n",
        "t2 = Conv2D(filters=filters[2], kernel_size=3, padding='same', activation='relu')(t2)\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gg1FYg4Bq-en",
        "colab_type": "text"
      },
      "source": [
        "We can build a simple model to visualize the result of this function:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Ys4dTTiGrA99",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "input = Input([224, 224, 3])\n",
        "output = inception_block(input, filters=[1, 1, 1, 1, 1, 1])\n",
        "\n",
        "from tensorflow.keras import Model\n",
        "model = Model(input, output)\n",
        "\n",
        "from tensorflow.keras.utils import plot_model\n",
        "plot_model(model)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lW-xyzAqrGf6",
        "colab_type": "text"
      },
      "source": [
        "In this example we used 1s for the number of filters of each convolution layer.\n",
        "\n",
        "<br/>\n",
        "\n",
        "In the actual model we will use the number written at the *Table*.\n",
        "\n",
        "The corresponding columns are:\n",
        "1. 1x1 Conv **(#1x1)**\n",
        "2. 1x1 Conv **(#3x3 reduce)** -> 3x3 Conv **(#3x3)**\n",
        "3. 1x1 Conv **(#5x5 reduce)** -> 5x5 Conv **(#5x5)**\n",
        "4. 3x3 Max Pool -> 1x1 Conv **(pool proj)**\n",
        "\n",
        "For example the filter numbers for `inception(3a)` will be:\n",
        "\n",
        "- 64, 96, 128, 16, 32, 32\n",
        "\n",
        "One last thing we have to notice before start coding the network is the convolution layer of the 3rd line of the *Table*.\n",
        "\n",
        "As one can see it has:\n",
        "- 64 filters at the **(#3x3 reduce)** column\n",
        "- 192 filters at the **(#3x3)** column\n",
        "\n",
        "<br/>\n",
        "\n",
        "This means that it actually consists of 2 Convolutional layers:\n",
        "- one with kernel size 1x1 and 64 filters\n",
        "- folowed by one with kernel size 3x3 and 192 filters\n",
        "\n",
        "### 4. Stem of the model\n",
        "#### Part 1\n",
        "\n",
        "From the *Table*:\n",
        "\n",
        "\n",
        "| type        \t|  patch size / stride \t| output size |\n",
        "|-------------\t|:--------------------:\t|:-----------:|\n",
        "| convolution \t|        7x7 / 2       \t| 112x112x64  |\n",
        "| max pool    \t|        3x3 / 2       \t| 56x56x64    |"
      ]
    },
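    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The spatial sizes in the table above can be checked with a little arithmetic. The following is a minimal sketch, assuming `padding='same'` for both layers (as in the code cell below), in which case Keras produces an output of size `ceil(input / stride)`. The helper name `same_output_size` is hypothetical:\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "\n",
        "def same_output_size(size, stride):\n",
        "    # With padding='same' the output size depends only on the stride.\n",
        "    return math.ceil(size / stride)\n",
        "\n",
        "\n",
        "size = 224\n",
        "size = same_output_size(size, stride=2)  # 7x7 convolution, stride 2\n",
        "print(size)  # prints 112\n",
        "size = same_output_size(size, stride=2)  # 3x3 max pool, stride 2\n",
        "print(size)  # prints 56\n",
        "```\n",
        "\n",
        "The depth (64) comes from the number of filters of the convolution; max pooling leaves it unchanged."
      ]
    },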
    {
      "cell_type": "code",
      "metadata": {
        "id": "_pGeqY8PrWuw",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "input = Input(shape=(224, 224, 3))\n",
        "x = Conv2D(filters=64, kernel_size=7, strides=2, padding='same', activation='relu')(input)\n",
        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AoNTswjxrWJE",
        "colab_type": "text"
      },
      "source": [
        "#### Part 2\n",
        "\n",
        "From the *Table*:\n",
        "\n",
        "\n",
        "| type        \t|  patch size / stride \t| #3x3 reduce \t| #3x3 \t|\n",
        "|-------------\t|:--------------------:\t|:-----------:\t|:----:\t|\n",
        "| convolution \t|        3x3 / 1       \t| 64          \t| 192  \t|\n",
        "| max pool    \t|        3x3 / 2       \t|             \t|      \t|\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RksblxW8rj5N",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = Conv2D(filters=64, kernel_size=1, activation='relu')(x)\n",
        "x = Conv2D(filters=192, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=3, strides=2)(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IroNqvSGrd8H",
        "colab_type": "text"
      },
      "source": [
        "### 5. Main part of the model\n",
        "\n",
        "#### Part 3\n",
        "From the *Table*:\n",
        "\n",
        "\n",
        "| type          \t|  patch size / stride \t| #1x1 \t| #3x3 reduce \t| #3x3 \t| #5x5 reduce \t| #5x5 \t| pool proj \t|\n",
        "|---------------\t|:--------------------:\t|:----:\t|:-----------:\t|:----:\t|:-----------:\t|:----:\t|:---------:\t|\n",
        "| inception(3a) \t|                      \t| 64   \t| 96          \t| 128  \t| 16          \t| 32   \t| 32        \t|\n",
        "| inception(3b) \t|                      \t| 128  \t| 128         \t| 192  \t| 32          \t| 96   \t| 64        \t|\n",
        "| max pool      \t|        3x3 / 2       \t|      \t|             \t|      \t|             \t|      \t|           \t|\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "C1oNucmUroxV",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = inception_block(x, filters=[64, 96, 128, 16, 32, 32])\n",
        "x = inception_block(x, filters=[128, 128, 192, 32, 96, 64])\n",
        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0SVgHrtMrsve",
        "colab_type": "text"
      },
      "source": [
        "#### Part 4\n",
        "From the *Table*:\n",
        "\n",
        "\n",
        "| type          \t|  patch size / stride \t| #1x1 \t| #3x3 reduce \t| #3x3 \t| #5x5 reduce \t| #5x5 \t| pool proj \t|\n",
        "|---------------\t|:--------------------:\t|:----:\t|:-----------:\t|:----:\t|:-----------:\t|:----:\t|:---------:\t|\n",
        "| inception(4a) \t|                      \t| 192  \t| 96          \t| 208  \t| 16          \t| 48   \t| 64        \t|\n",
        "| inception(4b) \t|                      \t| 160  \t| 112         \t| 224  \t| 24          \t| 64   \t| 64        \t|\n",
        "| inception(4c) \t|                      \t| 128  \t| 128         \t| 256  \t| 24          \t| 64   \t| 64        \t|\n",
        "| inception(4d) \t|                      \t| 112  \t| 144         \t| 288  \t| 32          \t| 64   \t| 64        \t|\n",
        "| inception(4e) \t|                      \t| 256  \t| 160         \t| 320  \t| 32          \t| 128  \t| 128       \t|\n",
        "| max pool      \t|        3x3 / 2       \t|      \t|             \t|      \t|             \t|      \t|           \t|\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BGh5qTqkru5S",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = inception_block(x, filters=[192, 96, 208, 16, 48, 64])\n",
        "x = inception_block(x, filters=[160, 112, 224, 24, 64, 64])\n",
        "x = inception_block(x, filters=[128, 128, 256, 24, 64, 64])\n",
        "x = inception_block(x, filters=[112, 144, 288, 32, 64, 64])\n",
        "x = inception_block(x, filters=[256, 160, 320, 32, 128, 128])\n",
        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ju2hpHPCr0Jk",
        "colab_type": "text"
      },
      "source": [
        "#### Part 5\n",
        "From the *Table*:\n",
        "\n",
        "\n",
        "| type          \t|  patch size / stride \t| #1x1 \t| #3x3 reduce \t| #3x3 \t| #5x5 reduce \t| #5x5 \t| pool proj \t|\n",
        "|---------------\t|:--------------------:\t|:----:\t|:-----------:\t|:----:\t|:-----------:\t|:----:\t|:---------:\t|\n",
        "| inception(5a) \t|                      \t| 256  \t| 160         \t| 320  \t| 32          \t| 128  \t| 128       \t|\n",
        "| inception(5b) \t|                      \t| 384  \t| 192         \t| 384  \t| 48          \t| 128  \t| 128       \t|\n",
        "| avg pool      \t|        7x7 / 1       \t|      \t|             \t|      \t|             \t|      \t|           \t|\n",
        "| dropout(40%)  \t|                      \t|      \t|             \t|      \t|             \t|      \t|           \t|\n",
        "| linear        \t|                      \t|      \t|             \t|      \t|             \t|      \t|           \t|\n",
        "| softmax       \t|                      \t|      \t|             \t|      \t|             \t|      \t|           \t|\n"
      ]
    },
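    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The effect of the average pooling step can be worked out by hand. The following is a minimal sketch, assuming a 7x7 feature map enters the pool (as in the paper's table) and `padding='valid'`, the Keras default, for which the output size is `floor((input - window) / stride) + 1`. The helper name `valid_output_size` is hypothetical:\n",
        "\n",
        "```python\n",
        "def valid_output_size(size, window, stride):\n",
        "    # With padding='valid' no padding is added around the input.\n",
        "    return (size - window) // stride + 1\n",
        "\n",
        "\n",
        "side = valid_output_size(7, window=7, stride=1)  # 7x7 average pool, stride 1\n",
        "channels = 384 + 384 + 128 + 128  # depth after inception(5b)\n",
        "print(side, channels)  # prints 1 1024\n",
        "```\n",
        "\n",
        "The pooled tensor is therefore 1x1x1024, which flattens to a vector of 1024 features for the final layer."
      ]
    },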
    {
      "cell_type": "code",
      "metadata": {
        "id": "pBcFS2AUr1pj",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = inception_block(x, filters=[256, 160, 320, 32, 128, 128])\n",
        "x = inception_block(x, filters=[384, 192, 384, 48, 128, 128])\n",
        "x = AvgPool2D(pool_size=7, strides=1)(x)\n",
        "x = Dropout(rate=0.4)(x)\n",
        "\n",
        "x = Flatten()(x)\n",
        "output = Dense(units=1000, activation='softmax')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GMvlmdn7VWLs",
        "colab_type": "text"
      },
      "source": [
        "### Model definition"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4I-6wbIHVMw5",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras import Model\n",
        "\n",
        "model = Model(inputs=input, outputs=output)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5nQ1VX0ir5jr",
        "colab_type": "text"
      },
      "source": [
        "## Final code"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bjdnCv4FsAtA",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, \\\n",
        "     Concatenate, AvgPool2D, Dropout, Flatten, Dense\n",
        "      \n",
        "def inception_block(x, filters):\n",
        "    t1 = Conv2D(filters=filters[0], kernel_size=1, activation='relu')(x)\n",
        "\n",
        "    t2 = Conv2D(filters=filters[1], kernel_size=1, activation='relu')(x)\n",
        "    t2 = Conv2D(filters=filters[2], kernel_size=3, padding='same', activation='relu')(t2)\n",
        "\n",
        "    t3 = Conv2D(filters=filters[3], kernel_size=1, activation='relu')(x)\n",
        "    t3 = Conv2D(filters=filters[4], kernel_size=5, padding='same', activation='relu')(t3)\n",
        "\n",
        "    t4 = MaxPool2D(pool_size=3, strides=1, padding='same')(x)\n",
        "    t4 = Conv2D(filters=filters[5], kernel_size=1, activation='relu')(t4)\n",
        "\n",
        "    output = Concatenate()([t1, t2, t3, t4])\n",
        "    return output\n",
        "\n",
        "\n",
        "input = Input(shape=(224, 224, 3))\n",
        "x = Conv2D(filters=64, kernel_size=7, strides=2, padding='same', activation='relu')(input)\n",
        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)\n",
        "\n",
        "x = Conv2D(filters=64, kernel_size=1, activation='relu')(x)\n",
        "x = Conv2D(filters=192, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=3, strides=2)(x)\n",
        "\n",
        "x = inception_block(x, filters=[64, 96, 128, 16, 32, 32])\n",
        "x = inception_block(x, filters=[128, 128, 192, 32, 96, 64])\n",
        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)\n",
        "\n",
        "x = inception_block(x, filters=[192, 96, 208, 16, 48, 64])\n",
        "x = inception_block(x, filters=[160, 112, 224, 24, 64, 64])\n",
        "x = inception_block(x, filters=[128, 128, 256, 24, 64, 64])\n",
        "x = inception_block(x, filters=[112, 144, 288, 32, 64, 64])\n",
        "x = inception_block(x, filters=[256, 160, 320, 32, 128, 128])\n",
        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)\n",
        "\n",
        "x = inception_block(x, filters=[256, 160, 320, 32, 128, 128])\n",
        "x = inception_block(x, filters=[384, 192, 384, 48, 128, 128])\n",
        "x = AvgPool2D(pool_size=7, strides=1)(x)\n",
        "x = Dropout(rate=0.4)(x)\n",
        "\n",
        "x = Flatten()(x)\n",
        "output = Dense(units=1000, activation='softmax')(x)\n",
        "\n",
        "from tensorflow.keras import Model\n",
        "\n",
        "model = Model(inputs=input, outputs=output)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZkB8eh0CqYef",
        "colab_type": "text"
      },
      "source": [
        "## Model diagram\n",
        "\n",
        "<img src=\"https://raw.githubusercontent.com/Machine-Learning-Tokyo/CNN-Architectures/master/Implementations/GoogLeNet/GoogLeNet_diagram.svg?sanitize=true\">"
      ]
    }
  ]
}